Comparative transcriptome and weighted correlation network analyses reveal candidate genes involved in chlorogenic acid biosynthesis in sweet potato

Chlorogenic acids (CGAs) are important secondary metabolites produced in sweet potato. However, the mechanisms of their biosynthesis and regulation remain unclear. To identify potential genes involved in CGA biosynthesis, analysis of the dynamic changes in CGA components and RNA sequencing were performed on young leaves (YL), mature leaves (ML), young stems (YS), mature stems (MS) and storage roots (SR). Accordingly, we found that the accumulation of six CGA components varied among the different tissues and developmental stages, with YS and YL recording the highest levels, while SR exhibited low levels. Moreover, the transcriptome analysis yielded 59,287 unigenes, 3,767 of which were related to secondary-metabolite pathways. The differentially expressed genes (DEGs) were identified based on CGA content levels by comparing the different samples, including ML vs. YL, MS vs. YS, SR vs. YL and SR vs. YS. A total of 501 common DEGs were identified, and these were mainly implicated in the secondary metabolites biosynthesis. Additionally, eight co-expressed gene modules were identified following weighted gene co-expression network analysis, while genes in darkgrey module were highly associated with CGA accumulation. Darkgrey module analysis revealed that 12 unigenes encoding crucial enzymes (PAL, 4CL, C4H, C3H and HCT/HQT) and 42 unigenes encoding transcription factors (MYB, bHLH, WD40, WRKY, ERF, MADS, GARS, bZIP and zinc finger protein) had similar expression patterns with change trends of CGAs, suggesting their potential roles in CGA metabolism. Our findings provide new insights into the biosynthesis and regulatory mechanisms of CGA pathway, and will inform future efforts to build a genetically improve sweet potato through the breeding of high CGA content varieties.


Results
Chlorogenic acid content and composition analysis in sweet potato tissues. Six CGAs in different tissues and developmental stages were quantified and analyzed by HPLC to evaluate the dynamic accumulation of CGA in sweet potato (Fig. 1A). All the six CGAs were detected at distinct levels in different tissues and developmental stages (Fig. 1B). Concomitantly, 5-O-caffeoylquinicacid (5-CQA) accumulation was highest in YL with a maximum of ~ 2.62 mg/g dry weight, while the remaining CQAs, including 3-caeoylqunic acid (3-CQA), 4-caeoylquinicacid (4-CQA), 3, 4-di-O-caffeoylquinic acid (3, 4-diCQA), 3, 5-di-O-caffeoylquinic acid (3, 5-diCQA) and 4, 5-di-O-caffeoylquinic acid (4,, were the most abundant in YS. Notably, the six CGAs were less abundant in the storage roots; only 0.11% ~ 0.70% of the CGAs were present in YL and YS. Transcriptome sequencing and assembly. Comparative transcriptome analysis of the YL, YS, ML, MS, and SR was performed in triplicates through RNA sequencing to understand the molecular mechanism of CGA biosynthesis in sweet potato and identify the associated genes. After eliminating the adapter and low-quality reads, a total of 99.92 Gb clean data was obtained from 15 sample, and each sample yielded up to 5.73 Gb of the data. The GC content of the 15 samples were 46.32∼48.25%, with an average of 46.99%, and the Q30 base percentage in each sample was more than 92.12%. The mapped ratio of each sample ranged from 73.45% to 77.24% (Supplementary Table S1). The assembly yielded 167,860 transcripts with an average length and an N50 value of 1,340 bp and 1,889 bp, respectively. After the processing by Trinity and TGICL software, 167,860 transcripts were further assembled into 59,287 unigenes, with an average length of 1,067 bp and an N50 value of 1,720 bp (Table 1).

Functional annotation and KEGG characterization.
A total of 40,781 unigenes (68.78%) were annotated by searching against public databases. Further, the unigenes matched to NR, COG, GO, KEGG, KOG, Pfam, Swissprot and eggnog databases were 39,784 (97.56%), 13 Table S2). The KEGG pathway-based analysis helps understand the biological functions of genes. Accordingly, 14,930 unigenes (25.18%) were mapped to 130 KEGG pathways, among which 3,578 genes were found to be associated with metabolic pathways. Moreover, 1,180 out of the 3,578 genes were mapped to 32 pathways involved in secondary biosynthesis and metabolism (Supplementary Table S3). The phenylpropanoid biosynthesis pathway (ko00940) was the largest group, containing 230 unigenes, followed by ascorbate and aldarate metabolism (ko00053, 86 unigenes) and phenylalanine metabolism (ko00360, 79 unigenes). CGA is a form of phenylpropanoid natural products in plants, and formed via the phenylpropanoid biosynthesis pathway. Thus, identifying and characterizing unigenes involved in phenylpropanoid biosynthesis pathways will help us better understand CGA biosynthesis in sweet potatoes.

Identification of differentially expressed genes (DEGs).
The gene expression levels were evaluated using fragments per kilobase of exon per million fragments mapped (FPKM) values. Furthermore, correlation analysis revealed that the 15 samples could be divided into five groups and that the replicates had a strong positive correlation (r > 0.89), suggesting high reproducibility and reliability of the transcriptome data ( Fig. 2A and Supplementary Table S4). The sample transcriptome data exhibiting noticeable differences in CGA accumulation were compared to identify candidate genes involved in CGA accumulation in sweet potato. In total, 10,458 DEGs were obtained by comparing the five samples in pairs. For the pairs, ML vs. YL, MS vs. YS, SR vs. YL and SR vs. YS, the differentially expressed unigenes were 2,518, 4920, 5,731 and 4,440, respectively (Fig. 2B). Specifically, the genes were up-regulated more than they were down-regulated. Additionally, 501 common unigenes were differentially expressed across the four comparison categories. The specific DEGs in ML vs. YL, MS vs. YS, SR vs. YL, and SR vs. YS were 577, 1,611, 2,307, and 754, respectively (Fig. 2C, Supplementary Table S5 and Supplementary Fig. S1).
Functional GO and KEGG enrichment analyses were performed to evaluate the functional categories of the 501 common DEGs. Functional GO analysis showed that 314 common DEGs were categorized into 31 functional groups, and these were further divided into three categories: biological process, cellular component and molecular function (Supplementary Fig. S2 and Supplementary Table S6). For the biological processes classification, the most abundant groups were metabolic processes (181), cellular processes (147), and single-organism processes (136). The most abundant groups within the molecular function category were catalytic activity (197) and binding (121). Finally, the highest categories were membrane (138), followed by cell (155) and cell part (155) in the cellular components category. Meanwhile, KEGG enrichment results showed that common DEGs were mainly enriched in phenylpropanoid biosynthesis, flavonoid biosynthesis and phenylalanine metabolism (Supplementary Fig. S3 and Supplementary Table S7), indicating that these metabolic pathways may be highly correlated to CGA accumulation in sweet potatoes.
Candidate genes involved in the chlorogenic acid biosynthesis pathway. CGA biosynthesis has been proposed to occur via three alternative routes (Fig. 3A). A total of 56 potential genes, including eight PAL, twenty-five 4CL, three C4H, thirteen HCT/HQT, four C3H, and three UGCT encoding putative enzymes involved in the CGA biosynthesis, were identified based on transcriptome data ( Table 2 and Supplementary Table S8). However, no gene encoding HCGQT in the second route was detected in transcriptome data. Meanwhile, differential expression analysis showed that 28 unigenes were identified as DEGs ( Fig. 3B and Supplementary  Table S8). Among the DEGs, six PAL (c82584.graph_c0, c91820.graph_c1 c103655.graph_c0, c92674.graph_ c0, c104837.graph_c0 and c91820.graph_c0), two 4CL (c100009.graph_c0 and c102274.graph_c0), two C4H (c98593.graph_c0 and c96439.graph_c0), one C3H (c94695.graph_c0) and two HCT/HQT (c90935.graph_c0, c95351.graph_c0) showed high expression levels in YL and YS, and relatively low expressed levels in SR. These findings were consistent with CGA accumulation (Fig. 3B), suggesting the probable implication of the 28 unigenes in the CGA biosynthesis. Interestingly, there was no correlation between the expression of UGCT genes and CGA accumulation.  www.nature.com/scientificreports/ tigated. Three out of thirteen unigenes were predicted as complete full-length ORFs, and the other partial unigenes were further blasted against the genome databases to obtain full-length sequences. Phylogenetic analysis showed that all characterized BAHD proteins were divided into five distinct clades (I-V) and HCT/HQT proteins belonged to group V ( Fig. 4A), which was similar to similar to results reported previously 8 . Four sequences (c85446.graph_c0, c125397.graph_c0, c97205.graph_c0 and c84565.graph_c0) belonged to clade IIIa and three sequences belonged to clade IIIb (c98234.graph_c1, c76240.graph_c0 and c100061.graph_c1), which were involved in the modification of alkaloid compounds and volatile ester biosynthesis 8 . Sequences of c90935.graph_ c0 (IbHQT1) and c95351.graph_c0 (IbHQT2) clustered together with HQTs in clade Vb, while c99097.graph_c0 (IbHCT1) assembled with HCTs in clade Vb. Sequences of c99097.graph_c0 in clade Vb had a close evolutionary relationship with AtSHT in Arabidopsis thaliana that characterized as a spermidine hydroxycinnamoyl transferase 22 . Multiple alignments of the deduced amino acid sequences indicated that IbHQT1, IbHQT2 and IbHCT1 had typical structural characteristics of HCT/HQT proteins, exhibiting the conserved HXXXD and DFGWG motifs (Fig. 4B). Therefore, IbHQT1, IbHQT2, and IbHCT1 are postulated to be the HQT/HCT genes in sweet potatoes.
To investigate the correlation between the CQA content and HQT/HCT expression levels, we measured the expression profiles of IbHQT1, IbHQT2 and IbHCT1 using qRT-PCR. The results indicated that the IbHQT1 transcript levels in different tissues and developmental stages were significantly higher than those of IbHQT2 and IbHCT1 (Fig. 4C), and showed a significant correlation with CQA content, probably enhancing CQA biosynthesis.

Candidate transcription factors involved in chlorogenic acid biosynthesis. Plant phenylpropa-
noid metabolism is often regulated by transcription factors targeting the structural genes encoding enzymes. A gene co-expression network was constructed by WGCNA and compared with CQA contents in different tissues and developmental stages to screen the candidate transcription factors regulating CQA accumulation in sweet potato. In total, 9,408 DEGs were divided into eight distinct co-expression modules based on the similarity of expression profiles (Fig. 5A). Among the modules, the turquoise module was found to be highly positively related to 5-CQA accumulation (0.95); while the correlation coefficient of the darkgrey module and that of with   5B). This finding suggested that the darkgrey module was highly correlated with the CGA accumulation in sweet potatoes. Further analysis showed that darkgrey module contains 12 genes required for the CQA biosynthesis pathway. The genes included six PALs (c82584.graph_c0, c91820.graph_c1 c103655.graph_c0, c92674.graph_c0, c104837.graph_c0 and c91820.graph_c0), one 4CL (c102274.graph_c0), two C4Hs (c98593.graph_c0 and c96439. graph_c0), one C3H (c94695.graph_c0) and one HCT/HQT (c90935.graph_c0), further substantiating the positive correlation between the darkgrey module and CQA biosynthesis. Among the 730 genes of darkgrey module, 42 were regulatory transcription factors, including nine MYBs, eleven bHLHs, seven zinc finger proteins (ZFPs), six bZIPs, three ERFs, two GARSs, two MADSs, one WRKY and one WD40 (Supplementary Table S10). These  www.nature.com/scientificreports/ transcription factors showed similar expression patterns with those of the genes encoding CQA biosynthesis enzymes (Fig. 5C), suggesting they may participate in regulating CGA biosynthesis in sweet potato.
Validation of DEGs via qRT-PCR Analysis. Ten DEGs associated with CGA biosynthesis were chosen for qRT-PCR assay to validate the reliability of gene expression data obtained from RNA-seq. As a result, all the 10 genes exhibited a similar pattern as those displayed by the FPKM values (R 2 > 0.9) (Fig. 6). Therefore, the RNA-Seq results might provide reliable data for further research on CGA biosynthesis and its regulation in sweet potato.

Discussion
Chlorogenic acids (CGAs) are widely distributed in sweet potato, contributing to biotic and abiotic stress resistance and nutritional benefits 18,20,23 . To date, most studies on sweet potato CGA mainly focus on extraction methods, and functional activities. However, very little is known about CGA biosynthesis mechanisms and their regulation in sweet potatoes 21 . Although CGAs have been reported in sweet potatoes, previous studies have not quantified and characterized CGA content in different tissues and different developmental stages. In this study, CGAs were found to be most abundant in young leaves and young stems, whereas much lower levels were detected in storage roots. Therefore, this study attempted to determine the potential genes responsible for CGA biosynthesis through comparative transcriptome combined with CGA content analyses in different tissues and developmental stages. Three possible routes have been reported for the CGA biosynthesis ( Fig. 2A). Putative genes encoding the key enzymes associated with the first and third routes of CGA biosynthesis were identified in the assembly. Phenylalanine ammonia-lyase (PAL), the first rate-limiting enzyme of the phenylpropanoid metabolism, has been www.nature.com/scientificreports/ demonstrated to play a crucial role in CGA synthesis [24][25][26][27] . Furthermore, we found that all the six differentially expressed PAL genes exhibited similar expression patterns that positively correlated changes in with CGA levels, suggesting an essential role of PAL in CGA accumulation. Recently, it has been confirmed that IbPAL1 (c82584. graph_c0) overexpression in sweet potato significantly enhanced CGA accumulation 20 . C4H and 4CL (the early enzymes of phenylpropanoid metabolism) in CGA biosynthesis are yet to be understood 27 . However, our findings showed that two C4H genes (c98593.graph_c0 and c96439.graph_c0) and one 4CL (c102274.graph_c0) exhibited a similar expression pattern with CGA concentrations, implying that they may be involved in CGA biosynthesis. The function of C3H in CGA biosynthesis has been determined by enzyme assays and gene overexpression in various plants [28][29][30] . Similarly, this study, expression of a C3H gene (c94695.graph_c0) was correlated with CQA content, which may be involved in CGA biosynthesis. Additionally, HQT/HCTs have been shown to play vital roles in CGA biosynthesis 9,31 . In the present study, three out of 13 potential HQT/HCT sequences (HbHQT1, HbHQT2 and HbHCT1) presented HXXXD and DFGWG conserved motifs, and were identified as HCT/HQT proteins. Among the three HCT/HQT genes, HbHQT1 exhibited relatively high abundance in different tissues and developmental stages than HbHQT2 and HbHCT1. Moreover, HbHQT1 expression significantly correlated with www.nature.com/scientificreports/ CQA content, suggesting its implication in CGA biosynthesis. Although the three genes encoding UDP-glucose: cinnamate glucosyltransferase (UGCT) in the second route of CGA biosynthesis were identified, their expression did not correlate with CGA levels in different tissues and developmental stages of sweet potato. HCGQT, catalyzing caffeoyl-D-glucose and quinic acid to form CGA, was postulated to be the key enzyme of the second CGA biosynthesis pathway 6 . However, no corresponding HCGQT gene has been identified in plants to date. In addition, chlorogenic acid: glucaric acid caffeoyltransferase (CQT) in tomato was shown to catalyze the transfer of caffeic acid from CGA to glucaric and galactaric acids, indicating a possible CGA recycling route in plants 32 . Thus, the first and third routes may be the main CGA biosynthesis pathways in sweet potato. Nonetheless, more research is needed to understand the significance of the second CGA biosynthesis route. Definite roles of the transcription factors in phenylpropanoid biosynthetic pathway regulation have been well established; however, less is known about the transcription factors regulating CGA biosynthesis 10,33 . Previous works have shown that the transcription factors involved in flavonoids biosynthesis might also regulate CGA biosynthesis 34,35 . Following the DEG and WGCNA analyses, we identified 42 transcription factor genes from nine families (MYB, bHLH, WD40, WRKY, ERF, GRAS, MADS, ZFP and bZIP) in the darkgrey module that might be involved in CGA biosynthesis (Supplementary Table S10 and Fig. 5C). These genes had a similar expression pattern across the different tissues and developmental stages and were co-expressed with CGA biosynthesis-related structural genes. Among the genes, c89342.graph_c0 was homologous to the Arabidopsis AtPAP1/AtMYB75 gene encoding an MYB transcription factor implicated in anthocyanin biosynthesis. In addition, overexpression of AtPAP1/AtMYB75 in Platycodon grandiflorum and Leonurus Sibiricuscan increased CGA content 34,35 . Conversely, c93698.graph_c0 showed high similarity to the Arabidopsis AtMYB12, which was identified as a flavonol-specific transcriptional activator in Arabidopsis. Arabidopsis gene AtMYB12 also regulated CGA biosynthesis in tomato [36][37][38] . Moreover, bHLH (c104813.graph_c0, encoding GLABRA 3) and bZIP (c95202.graph_c3, encoding HY5) genes were detected by darkgrey module, their homologous were considered to be key transcription factors controlling anthocyanin biosynthesis 39,40 . In addition, LjbZIP8 could act as a transcriptional repressor in regulating PAL2 expression and CGA content in Lonicera japonica 41 . Recent studies have revealed that the WRKY transcription factors acting as HCT2 activators in poplar might also play an important role in CGA biosynthesis 10 . These studies showed the importance of transcription factors in regulating CGA biosynthesis. Although no reports on CGA biosynthesis regulation by WD40, ERF, GRAS, MADS and ZFP transcription factors have been published so far, these transcription factors have been reported to regulate other phenylpropanoids such as flavonoids and lignins [42][43][44][45][46][47][48] , suggesting their possible participation in CGA biosynthesis regulation. Therefore, understanding the specific functions of the 42 transcription factors in regulating sweet potato CGA biosynthesis need further studies.
In summary, comparative transcriptome and CGA changes in different sweet potato tissues and developmental stages were systematically investigated. The CGA accumulation varied among the tissues and developmental stages, and was abundant in YL and YS. Moreover, 59,287 unigenes were obtained, 3,767 of which were involved in secondary metabolism pathways. A darkgrey module associated with CGA accumulation was identified by differential expression analysis and WGCNA. In this module, 12 unigenes encoding crucial enzymes (PAL, 4CL, C4H, C3H and HCT/HQT) and 42 unigenes encoding transcription factors (MYB, bHLH, WD40, WRKY, ERF, MADS, GARS, bZIP and zinc finger protein) in darkgrey module may play essential roles in CGA biosynthesis. These results provide a basis for future research on the biosynthesis and regulation of the CGA metabolism pathway in sweet potatoes. www.nature.com/scientificreports/ Methods Plant materials. Sweet potatoes (QS80-12-11) used in this study were grown on a plantation in Hainan Academy of Agricultural Sciences, Hainan, China. The samples from different tissues and developmental stages (young leaves, mature leaves, young stems, mature stems, and storage roots) were collected at 90 days after planting for RNA extraction and CGA content analysis (Fig. 1A). Each sample was collected from at least three individual plants, and all experiments were conducted in triplicate. A portion of the samples was immediately frozen in liquid nitrogen, and stored at -80 °C until RNA extraction. The other samples portion was dried in a blast drier (ZHICHENG, Shanghai, China) at 60 °C and ground for CGA measurement analysis.
Estimation of chlorogenic acid content. The ground samples (0.2 g) were mixed with 1.5 ml of ethanol (70%, v/v) and then incubated in a water bath at 60 °C for 1 h. The mixture was then centrifuged at 5,000 g rpm for 10 min. Thereafter, the supernatant liquid was blown dry using nitrogen, re-dissolved in 1 ml methanol, and then filtered through 0.22 μm membrane filter for CGA analysis. Additionally, HPLC analysis was performed using an  58 . In addition, the gene expression level was calculated and normalized to fragments per kilobases of transcript per million fragments mapped (FPKM) values. Differentially expressed genes (DEGs) from different samples were identified by the DESeq2 software (version 1.12.4) based on the criteria of log2 |Fold Change|> 2, FDR < 0.001 and FPKM values > 2. Nonetheless, co-expression analysis was performed using the "Weighted Correlation Network Analysis (WGCNA)" package in R 59 . The data used in WGCNA analysis were components of the six CGA types and total CGA (sum of the six CGA components) compared with all RNA-seq genes.

Bioinformatics analysis of HCT/HQT proteins. The open reading frames (ORFs) of HQT/HCT
genes were predicted using the ORF finder (http:// www. ncbi. nlm. nih. gov/), and full length of HCT/HQT unigenes with incomplete ORFs were obtained by scanning the unigenes against the sweet potato genome databases. All candidate HCT/HQT sequences were confirmed by Pfam (http:// pfam. sanger. ac. uk/) and NCBI conserved domains database (CDD) (http:// www. ncbi. nlm. nih. gov/ cdd) databases. Eventually, a phylogenetic tree was established by MEGA 9.0 and Clustal X2.0 based on the candidate HCT/HQT protein sequences from sweet potato and other plants using the Neighbor-Joining method with 1000 bootstrap replicates.
qRT-PCR analysis. Quantitative real-time PCR (qRT-PCR) was employed to validate the reliability of gene expression using the RNA-seq data. The assay was performed in a Real-Time System Thermocycler using FastFire qPCR PreMix (Tiangen, Beijing, China). The conditions of the qRT-PCR amplification were as follows: initial denaturation at 95 °C for 3 min, followed by 40 cycles for 10 s at 95 °C, 15 s at 58 °C and 20 s at 72 °C. All reactions were performed in triplicates with specific primers (Supplementary Table S11), and β-actin was used as the reference gene. The relative gene expression level was calculated using the delta-delta Ct (2 −ΔΔCT ) method 60 .
Statement on plant guidelines. Collection of plant material complies with relevant institutional, national, and international guidelines and legislation.